Abstract: The recent advent of high-throughput approaches has revealed widespread 
transcription of the human genome, leading to a new appreciation of transcription 
regulation, especially from noncoding regions. Distinct from most coding and small 
noncoding RNAs, long noncoding RNAs (lncRNAs) are generally expressed at low levels, 
are less conserved and lack protein-coding capacity. These intrinsic features of lncRNAs 
have not only hampered their full annotation in the past several years, but have also 
generated controversy concerning whether many or most of these lncRNAs are simply the 
result of transcriptional noise. Here, we assess these intrinsic features that have challenged 
lncRNA discovery and further summarize recent progress in lncRNA discovery with 
integrated methodologies, from which new lessons and insights can be derived to achieve 
better characterization of lncRNA expression regulation. Full annotation of lncRNA 
repertoires and the implications of such annotation will provide a fundamental basis for 
comprehensive understanding of pervasive functions of lncRNAs in biological regulation. 
1. Introduction 

It is well known that DNA is transcribed into messenger RNA (mRNA), which is then translated to 
protein(s) with the help of housekeeping noncoding RNAs (ncRNAs) such as transfer RNAs (tRNAs) 
and ribosomal RNAs (rRNAs). Messenger RNAs serve as intermediate carriers, forwarding genetic 
information (as coding genes) from DNA to protein. Characterization of coding genes and their protein 
products has been of great importance in our goal to understand gene expression regulation. While 
early expectations were to find about 100,000 genes in the human genome, the estimate has stood 
at 20,000-25,000 genes [1] since the first draft of the human genome was released in 2001 [2]. We have 
now learned that only about 2% of the human genome encodes protein sequences [1]. Much of the 
remaining noncoding sequence used to be considered "junk" or "dark matter" [3,4], despite evidence of 
its participation in gene expression regulation at multiple levels. Housekeeping ncRNAs with known 
functions have been studied for many decades. For example, they play key roles in translation (tRNA 
and rRNA), splicing (snRNA), and RNA modification (snoRNA). The advent of state-of-the-art deep 
sequencing technology has revealed that most of the human genome is pervasively transcribed [5,6], 
indicating a rich pool of ncRNAs beyond the aforementioned well-characterized molecules. 

New small regulatory ncRNAs were first identified by exogenous RNA interference in plants and 
nematodes, and later found to exist endogenously. These small ncRNAs, including but not limited to 
microRNAs (about 22 nt long), function as posttranscriptional repressors [7]. Through a combination of 
size-selected high-throughput sequencing and computational approaches, a very large number of small 
ncRNAs have now been identified and predicted in genomes, and their evolutionary conservation and 
structural stability have been extensively analyzed [8]. Generally speaking, the computational pipeline 
for small ncRNA prediction with high-throughput experiments is now relatively mature [9,10], and 
over 1600 precursors and 2042 mature miRNAs have been reported in the human genome (miRBase 
19, released August 2012). 

Beyond the small regulatory ncRNAs, the multifaceted transcriptome has become even more complex 
with the discovery of the pervasive transcription of long noncoding RNAs (lncRNAs, at least 200 nt 
long). LncRNAs are known to play important roles in both biological and pathological events [11-14], 
including X-chromosome inactivation (Xist) [15], genomic imprinting (Air, Kcnq1ot1) [16,17] and 
nuclear trafficking (NRON) [18]. The application of tiling arrays allowed the discovery of additional 
lncRNAs, including the well-characterized HOTAIR [19], NEAT1 and MALAT1 [20]. These lncRNAs 
are involved in trans-acting gene regulation (HOTAIR) [19], providing a structural scaffold in nuclear 
architectures (NEAT1) [21-24] and alternative splicing regulation (MALAT1) [25], although the effects 
might be very subtle as indicated by discrepancies in cell cultures [25] and mouse models [26]. 
Detailed studies of these abundant lncRNAs have served as road maps for the functional characterization 
of other lncRNAs. Very recently, the new finding and understanding of pervasive transcription from 
the "dark matter" attracted our attention to an integrated annotation of lncRNAs from transcriptomes. 
The existence of thousands of lncRNAs from intergenic regions (large intergenic noncoding RNA, 
lincRNA) has been inferred from massive high-throughput sequencing data including histone modification 
landscapes (chromatin signatures) in both mouse [27] and human [28]. In addition, functional investigations 
of certain lncRNAs further revealed additional roles of these molecules in gene expression regulation, 
from controlling chromatin complexity [29], to acting as competing endogenous RNAs [30], to performing 
enhancer-like functions [31], and to maintaining pluripotency [32] and embryogenesis [33]. In addition, 
non-polyadenylated RNA enrichment from human transcriptomes, followed by computational analysis, 
revealed that some excised introns can stably accumulate as lncRNAs [34]. In some cases, 
intron-derived lncRNAs are capped by snoRNAs at both ends to protect intronic sequences from degradation 
after splicing, leading to the formation of a new class of lncRNAs (sno-lncRNAs) [35]. 

As up to 70% of the human genome can be transcribed [36] and only about 2% of the human 
genome encodes protein coding genes including UTRs [1], it is not surprising that the majority of 
lncRNAs were previously classified as "junk sequences" or "dark matter". Some of the 
best-characterized lncRNAs are generally highly expressed and conserved across species, but these features 
are more the exception than the rule and cannot be generalized to thousands of other lncRNAs 
identified by large-scale screening. The latter are generally expressed at a low level [37] and are less 
conserved [38], which has impeded their discovery and functional studies. In this review, we assess 
issues that have challenged lncRNA discovery in the past, and also highlight recent experimental and 
computational designs that have facilitated lncRNA identification and characterization. These 
advances not only shed light on lncRNA characterization but also reveal the complex mechanisms they 
use to regulate other molecules. 

2. Challenges for LncRNA Discovery 

In the first decade of this century, whole genome sequencing revealed approximately 20,000 protein 
coding genes in humans, which is comparable to estimates in the fly and worm, although humans 
exhibit much more complexity through alternative splicing [1,2]. With the rapid development of 
high-throughput technologies, growing lines of evidence have indicated that genomes are pervasively 
transcribed, with many previously ignored portions of the genome transcribed as lncRNAs [6,36,38] 
(Figure 1a). However, several intrinsic features of lncRNAs have posed challenges for their discovery 
as well as their functional study, as discussed below. 

2.1. LncRNAs in General Are Expressed at Low Levels in vivo, but with High Tissue-Specificity 

RNA-seq (deep sequencing from reverse-transcribed RNAs) datasets revealed that the human 
genome is pervasively transcribed [5]. However, the extent of this pervasive transcription has been 
disputed [39,40]. The controversy has been partially due to different datasets and computational 
approaches [6] that were applied to individual analyses, but also to the nature of the low expression in 
most noncoding regions in genomes. For example, many such transcripts from intergenic or intronic 
regions were detected at very low levels by various technologies [41]. In addition, the median 
expression level of lincRNAs was approximately one-third of that of coding genes in the mouse [42] 
and about 10-fold lower than that of coding genes in humans [28]. Moreover, the recent Encyclopedia of 
DNA Elements (ENCODE) project released a variety of transcriptomes of RNA repertoires from 15 
human cell lines. The complete annotation of these transcriptomes suggested that lncRNAs have lower 
expression levels than coding RNAs [36]. In particular, 80% of detected lncRNAs exist in <1 copy per 
cell, compared with only 25% of coding RNAs in examined cell lines [36]. Taken together, the low 
expression of lncRNAs hampers their discovery, precise annotation, and subsequent 
functional studies. Nonetheless, the expression of a few lncRNAs is comparable to or even higher than 
that of coding genes in certain cell lines (e.g., H19 in NHEK cells [36] and sno-lncRNAs in hES cells [35]). 

Accumulated results suggest that most lncRNAs exhibit a low level of expression but highly tissue- 
or cell-specific patterns [37,38,43]. About 78% of human lincRNAs are tissue-specific, compared with 
about 19% for protein coding genes [28]. Moreover, the complete transcriptome analyses from 15 
human cell lines in the ENCODE project showed that 29% of all detected lncRNAs are only from one 
cell line and only 10% are expressed in all cell lines. In contrast, 7% of expressed coding RNAs were 
only detected from one cell line, but 53% of them were expressed in all cell lines [36]. These observations 
indicate that tissue-specific expression patterns make the identification and characterization of 
these lncRNAs quite challenging if only a small portfolio of tissues/cell lines is chosen for analysis. 
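
The cell-line comparison above can be sketched as a simple tally. This is a minimal illustration, assuming a hypothetical mapping from transcript IDs to the set of cell lines in which each transcript was detected; the function name and toy data are illustrative and are not taken from the ENCODE analysis [36].

```python
from typing import Dict, Set


def specificity_summary(detected_in: Dict[str, Set[str]], n_cell_lines: int) -> Dict[str, float]:
    """Return the fraction of transcripts detected in exactly one cell line
    ("single") and the fraction detected in every cell line ("all")."""
    total = len(detected_in)
    if total == 0:
        return {"single": 0.0, "all": 0.0}
    single = sum(1 for lines in detected_in.values() if len(lines) == 1)
    ubiquitous = sum(1 for lines in detected_in.values() if len(lines) == n_cell_lines)
    return {"single": single / total, "all": ubiquitous / total}


# Toy data (hypothetical): four transcripts profiled across three cell lines.
toy = {
    "tx_a": {"A"},
    "tx_b": {"A", "B", "C"},
    "tx_c": {"B"},
    "tx_d": {"A", "B"},
}
summary = specificity_summary(toy, n_cell_lines=3)
```

Computing this summary separately for lncRNAs and coding RNAs gives the kind of contrast quoted above (29% versus 7% single-cell-line transcripts), although any real analysis would also need an explicit detection threshold.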

2.2. Evolutionary Conservation of LncRNAs on Average Is Relatively Lower than That of Coding RNAs 

Homologous sequence comparison is an efficient method for identifying genes that exhibit similar 
functions between species and for discovering novel coding regions [44]; however, it is less 
effective for non-protein-coding sequences, because they are less conserved. For example, only a 
small portion (<5%) of noncoding sequences are conserved between human and mouse [5,45]. Recent 
transcriptome analyses by a variety of RNA-seq experiments indicated the existence of thousands of 
poorly conserved lncRNAs in the zebrafish [46], mouse [27] and human [28] genomes. Only 
29 out of 550 lincRNAs in zebrafish have detectable sequence similarity with putative mammalian 
orthologs, and similar sequences are typically restricted to a single short region of high conservation [46]. 
Overall, although lncRNAs are less conserved across species than protein coding genes, they still on 
average show somewhat higher levels of conservation than random regions or introns [42]. 

Usually, evolutionary constraint can be estimated from the nucleotide substitution rate in functional 
sequences [47]. Nucleotide substitutions in ncRNAs are on average about 90-95%, compared with 
about 10% in coding genes. This is reasonable, as nucleotide substitutions tend to be less deleterious in 
noncoding sequences than in coding ones [47]. The limited phylogenetic range of many ncRNAs can be 
explained by their rapid emergence or decline within particular lineages [48]. For instance, it has been 
suggested that about one third of lncRNAs have arisen within the primate lineage only [38]. 

The aforementioned studies suggested that low evolutionary conservation might be a natural feature 
of noncoding transcripts, which is consistent with their rather poor genome-wide annotations in early 
studies [1,2,4]. However, considering the relatively higher species divergence, it is possible to identify 
more novel lncRNAs from different species/evolutionary lineages. Their generally low expression 
level together with poor conservation initially led researchers to conclude that transcripts from 
noncoding segments may represent transcriptional noise [49]. However, lack of conservation does not 
mean lack of function [50]. For example, human NEAT1 RNA and its mouse homolog Men ε/β have 
low sequence similarity [20] but are functionally conserved [21-24]. Interestingly, some mouse 
pseudogenes, whose ancestors have lost their protein-coding capabilities during rodent evolution, have 
retained their expression and act as competitive noncoding RNAs and function as miRNA-decoys [51]. 
In fact, an increasing number of intensive functional studies have shown that lncRNAs are not just 
ancient relics with little function, but have a variety of roles from epigenetic regulation to pluripotency 
maintenance, and are also highly correlated with some human diseases [52,53]. 
2.3. Controversial Coding Capacity of LncRNAs 

Exclusion of protein-coding capacity is a fundamental requirement for lncRNA definition. In the 
post-genomic era, this capacity can be predicted genome wide using computational approaches, mainly 
based on the length and conservation of ORFs [54]. Cutoffs for minimal ORF length, whether set at 
300 nt (100 amino acids) [55] or even 60 nt (20 amino acids) [56], can still cause controversy. For 
example, some well-characterized lncRNAs, such as Xist [57], have remnants featuring ORFs longer 
than 100 amino acids. With widespread transcription from a given genome, one can imagine that 
many transcripts identified as lncRNAs may contain ORF remnants, while some coding RNAs may 
contain only small ORFs for short polypeptides. In this case, computational algorithms with multiple 
features incorporated are needed to distinguish truly noncoding RNAs from coding ones. For instance, 
CPC [58] uses six features to not only evaluate the extent and quality of ORFs, but also parse the 
ORF conservation of sequences using BLASTX [59]. Although low conservation of ORFs may reflect 
gene evolution in specific lineages or gene loss in other lineages, studies suggested that most 
putative human ORFs with no cross-species counterparts are likely to be random occurrences [60] and 
this is indeed the case for Xist [57]. The phylogenetic codon substitution frequency (PhyloCSF) 
metric, based on comparison of orthologous transcripts, was used to distinguish noncoding transcripts from 
coding ones [61], and was successfully applied for lincRNA predictions in both mouse [42] and humans [28]. 
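
The ORF-length cutoff discussed above can be illustrated with a minimal sketch. It assumes a stranded transcript sequence, scans only the three forward frames, and counts ORF length from ATG through the stop codon inclusive; these choices, the function names and the default threshold are illustrative assumptions, not the procedure used by CPC [58] or PhyloCSF [61].

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_nt(seq: str) -> int:
    """Length in nt (ATG through stop codon, inclusive) of the longest
    complete ORF in the three forward reading frames."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def passes_orf_filter(seq: str, min_orf_nt: int = 300) -> bool:
    """True if no complete ORF reaches the cutoff (candidate noncoding)."""
    return longest_orf_nt(seq) < min_orf_nt


# Toy transcript (hypothetical) with one 9-nt ORF: ATG AAA TAG.
toy_orf_length = longest_orf_nt("CCATGAAATAGCC")
```

A filter of this kind would flag Xist as potentially coding under the 300 nt cutoff, which is why ORF length alone is treated in the text as insufficient evidence.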

Besides computational judgments based on critical features of putative ORFs, several other crucial 
criteria, such as the subcellular localization and the accessibility to the translation machinery, could 
also be used to evaluate whether a given transcript is a true lncRNA or not. Nuclear localization of an 
RNA transcript suggests that its function is primarily noncoding. This can be estimated 
experimentally by RNA fractionation from nuclear homogenates [38], as exemplified by NEAT1 [21] 
and DBE-T [62], despite the risk of possible nuclear/cytoplasmic leakage during RNA isolation. RNA 
fluorescence in situ hybridization (FISH) is an alternative way to examine the subcellular localization. 
A growing list of well-characterized lncRNAs do localize in the nucleus and within specific subnuclear 
structures as illuminated by RNA FISH and are associated with nuclear proteins as revealed by 
RNA-protein double FISH [63]. Furthermore, ribosome profiling coupled with RNA-seq can provide extra 
insights into the accessibility of a given transcript to the translational machinery [64]. Moreover, 
proteome datasets with a spectrum of all protein products can also be mined for the 
existence/non-existence of coding products from tested transcripts. These datasets offer the most direct 
evidence to determine coding capacity of any transcript, although with low resolution and low 
availability. Finally, it cannot be ruled out that some transcripts have a dual nature, acting both as 
ncRNA and producing protein products [65,66]. 

The best way to distinguish between coding and non-coding sequences is to integrate computational 
and experimental approaches that enhance understanding of lncRNA expression regulation and 
biological function in vivo. 
3. Recent Progress in LncRNA Discovery Using New Strategies 

With technological improvements and the application of integrated methodologies, significant 
progress has been achieved in uncovering new lncRNA molecules. Some of these practical strategies 
can be further applied to achieve new insights into lncRNA functions. 

3.1. Application of Chromatin Signatures to Determine LncRNAs from Intergenic Regions 

Several individual studies have applied a systematic and integrative strategy with multiple 
biological features to identify lncRNAs, mainly in intergenic regions (lincRNAs), first in mouse [27] 
and then in zebrafish [46] and human [28] genomes. In contrast to previous efforts, these studies 
used "H3K4me3-H3K36me3" chromatin signatures in all three species: the histone 3 Lys 4 
trimethylation (H3K4me3) signature marks lncRNA promoters, and the histone 3 Lys 36 trimethylation 
(H3K36me3) signature marks actively transcribed lncRNA regions. By differentiating the 
"H3K4me3-H3K36me3" chromatin signatures of lncRNAs from those of known coding 
genes/microRNAs/endogenous siRNAs, these analyses reliably 
identified lncRNA-expressed genomic sequences, largely in intergenic regions (Figure 1b). In addition, 
other stringent criteria have also been taken into account for lncRNA characterization, including the 
identification of poly(A) sites, transcription initiation signals, expression patterns among tissues and 
potential coding capacity. Loss-of-function and gain-of-function studies of certain conserved lncRNAs 
demonstrated crucial biological roles of lncRNAs in zebrafish [46], indicating functional conservation 
despite limited sequence conservation. More importantly, some lincRNAs have been shown to play 
important roles in multiple layers of biological processes, including epigenetic regulation and 
pluripotency maintenance (reviewed by Guttman [14], Rinn [13] and their colleagues). 
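
The chromatin-signature logic can be sketched with plain interval arithmetic. This is a simplified illustration on a single chromosome, assuming peak and domain coordinates are already available as half-open intervals; the function names, the 2 kb adjacency window and the toy coordinates are hypothetical and do not reproduce the published pipelines [27,28,46].

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def candidate_k4_k36_units(
    k4_peaks: List[Interval],
    k36_domains: List[Interval],
    known_genes: List[Interval],
    max_gap: int = 2000,  # assumed adjacency window, not a published value
) -> List[Interval]:
    """Pair each H3K36me3 domain with a nearby H3K4me3 peak and keep the
    merged unit only if it does not overlap any annotated gene."""
    candidates = set()
    for k36 in k36_domains:
        for k4 in k4_peaks:
            widened = (k4[0] - max_gap, k4[1] + max_gap)
            if not overlaps(widened, k36):
                continue
            unit = (min(k4[0], k36[0]), max(k4[1], k36[1]))
            if not any(overlaps(unit, gene) for gene in known_genes):
                candidates.add(unit)
    return sorted(candidates)


# Toy coordinates (hypothetical): the second K4-K36 pair lies inside a known gene.
units = candidate_k4_k36_units(
    k4_peaks=[(1000, 1500), (50000, 50500)],
    k36_domains=[(1600, 9000), (50400, 70000)],
    known_genes=[(49000, 72000)],
)
```

A real analysis would additionally handle strand, multiple chromosomes and the other criteria listed above, such as poly(A) sites and coding potential.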

3.2. Development of a Non-Polyadenylated RNA Enrichment Strategy to Uncover LncRNAs from Introns 

Most RNA polymerase II transcripts, including mRNAs and lncRNAs, are polyadenylated 
(poly(A)+) at their 3' ends. The application of transcriptome analysis of poly(A)+ RNA by 
high-throughput deep sequencing (mRNA-seq) has revealed a digital map of poly(A)+ transcripts from both 
known and previously un-annotated genes [67]. However, the transcribed portion of the genome is 
more than poly(A)+ transcripts, and there are a large number of non-polyadenylated transcripts 
(poly(A)- transcripts), including ribosomal RNAs (rRNAs) generated by RNA polymerases I and III, 
other small RNAs generated by RNA polymerase III, replication-dependent histone mRNAs [68] and 
some lncRNAs [24,69] transcribed by RNA polymerase II. Depletion of ribosomal RNAs (RiboMinus) 
from total RNA results in both poly(A)+ and poly(A)- transcripts available for deep sequencing 
analysis. This has led to the discovery of many new poly(A)- transcripts when compared with 
poly(A)+ RNA deep sequencing [70,71]. However, rRNA-depletion methods cannot physically 
separate poly(A)- transcripts from poly(A)+ RNAs, so it is difficult to directly annotate poly(A)- 
transcripts using only the rRNA-depletion method. Recently, a combination of both rRNA and 
poly(A)+ RNA removal was applied to obtain a largely pure population of poly(A)- RNAs for 
high-throughput deep sequencing [34]. This type of poly(A)- RNA-seq of the human cell transcriptomes 
surprisingly revealed many previously un-annotated RNA transcripts, including a new family of 
lncRNAs from introns in humans [35] (Figure 1b). In addition, with the same separation strategy for 
poly(A)- transcripts followed by deep sequencing analyses, additional poly(A)- lncRNAs from intronic 
regions were also found in various human cell lines [38]. Interestingly, RNA fractionation from nuclear 
homogenates also indicated the presence of stable intronic sequence RNAs in X. tropicalis [72]. As 
most lncRNAs are tissue/cell-specific and species-specific, further application of poly(A)- RNA-seq 
for different tissues and species may result in the identification of additional intron-derived lncRNAs. 
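
Once poly(A)+ and poly(A)- fractions have been sequenced separately, each transcript can be assigned to a fraction by comparing its normalized expression in the two libraries. The sketch below is only an illustration of that idea; the function name, the two-fold ratio and the minimum expression threshold are hypothetical choices, not the criteria used in the cited studies [34,38].

```python
def classify_polya(plus_expr: float, minus_expr: float,
                   fold: float = 2.0, min_expr: float = 1.0) -> str:
    """Assign a transcript to a fraction from its normalized expression in
    the poly(A)+ and poly(A)- libraries (same units in both)."""
    if max(plus_expr, minus_expr) < min_expr:
        return "not expressed"
    if minus_expr >= fold * plus_expr:
        return "poly(A)-"
    if plus_expr >= fold * minus_expr:
        return "poly(A)+"
    return "ambiguous"


# Toy values (hypothetical): an intron-derived transcript enriched in poly(A)-.
label = classify_polya(plus_expr=0.2, minus_expr=8.0)
```

Transcripts labeled "ambiguous" here are those present at similar levels in both fractions, which a threshold-based scheme cannot assign confidently.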

What mechanism(s) can generate RNA transcripts without canonical poly(A) tails at their 3' ends? 
For most of the replication-dependent histone pre-mRNAs, evolutionarily conserved stem-loop 
structures in their 3' UTRs direct U7 snRNA-mediated 3' end formation to stabilize mature mRNAs 
and confer cell cycle-dependent regulation of their accumulation [67]. For MALAT1 and Men ε/β 
lncRNAs, 3' end maturation depends on RNase P cleavage [24,69], and the resulting ends are stabilized by 
highly conserved A- and U-rich motifs that form a triple-helical structure [73,74]. For telomerase RNA in 
S. pombe, incomplete splicing, rather than complete splicing, generates a functional TER1 transcript [75]. 
However, it appears that none of the above mechanisms can explain the biogenesis of 
lncRNAs from introns, as introns are generally rapidly degraded after splicing. Yin et al. recently 
demonstrated that intron-derived sno-lncRNAs depend on the snoRNA machinery at both ends for their 
processing and on snoRNP complexes at both ends to protect intronic sequences from exonucleolytic 
trimming [35]. Genome-wide analysis of poly(A)- RNAs from introns has revealed a large number of 
lncRNAs from intron regions [34,38]; however, only some are capped with snoRNAs. The biogenesis 
of others needs to be further addressed. Finally, in addition to poly(A)- RNA-seq, the development of 
more specific experimental and computational approaches will help to understand other poly(A)- 
lncRNAs matured by RNase P cleavage or incomplete splicing. 

3.3. Determination of Co-Factors to Study LncRNA Biogenesis and Function 

It is now clear that lncRNAs play important roles in a variety of biological processes [13,14,63]. So 
far, only a handful of mechanisms have been identified to explain how lncRNAs function in vivo. 
Accumulated lines of evidence suggest that very often lncRNAs function by recruiting and assembling 
other co-factors, which are usually proteins but possibly other RNAs [51,76,77] or DNAs [78]. 
Clearly, identifying these co-factors is of key importance for understanding lncRNA function. 

The lncRNA Xist is capable of recruiting Polycomb Repressive Complex 2 (PRC2) to remodel 
chromatin modifications [79], resulting in transcriptional inactivation of one X chromosome. Similarly, 
Air and Kcnq1ot1 lncRNAs achieve transcriptional silencing by recruiting chromatin-remodeling 
complexes during genomic imprinting [80,81]. Indeed, many lncRNAs have been identified to bind 
with PRC2 or other chromatin-modifying complexes for transcriptional repression [32,82]. In addition, 
lncRNAs can also activate gene transcription by binding specific protein factors. For instance, Evf-2 
binds the Dlx-2 protein, which in turn increases the activity of the Dlx-5/6 enhancer [83]. Interestingly, 
one specific IncRNA might play complementary roles in gene expression regulation by selectively 
recruiting either PcG for repression [84] or Trithorax group proteins (TrxG) for activation [85]. 

In addition, lncRNAs can act as molecular scaffolds. For example, telomerase RNA component 
(TERC) acts as a flexible scaffold for bridging protein subunits together to promote telomerase 
activity [86]. NEAT1 lncRNA is crucial for the integrity of paraspeckles [21-24], and a recent study 
revealed that NEAT1 is capable of initiating de novo paraspeckle formation [87]. 

Moreover, lncRNAs can also function as molecular sponges or decoys to affect gene regulation 
mediated by protein cofactors. For example, Gas5 lncRNA binds the glucocorticoid receptor (GR) to 
compete against the association of the GR with other glucocorticoid response DNA elements, resulting 
in functional repression of GR [88]. PWS region sno-lncRNAs trap Fox family members to alter local 
Fox protein concentration and, subsequently, modulate Fox-regulated alternative splicing events [35]. 
Meanwhile, lncRNAs also act as competing endogenous decoys through their microRNA response 
elements (MREs) to titrate the availability of miRNAs for the other RNA molecules [30,51,76,77]. 
Finally, promoter-associated lncRNAs can directly interact with enhancer DNA elements to form 
DNA:RNA triplexes to carry out their regulatory function [78]. 

Taken together, these studies suggest that the functional specificity of a given lncRNA is largely 
dependent on the association with its co-factors, mainly protein partners. Hence, it is important to find 
associated protein co-factors in order to fully understand the functional roles of lncRNAs. While the 
potential binding capacity can be predicted by computationally searching for consensus RNA 
sequences/motifs, direct lncRNA-protein interactomes can also be retrieved from cross-linking 
immuno-precipitation coupled with high-throughput sequencing (CLIP-seq) (Figure 1b), or using 
labeled lncRNAs as baits to pull down protein partners. 
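
A computational search for consensus motifs can be as simple as scanning a transcript for every occurrence of a short sequence. The sketch below is a minimal illustration; the function name is hypothetical, and the default hexamer UGCAUG is used here only as an example of a splicing-regulator binding motif, so it should be replaced with the consensus reported for the protein of interest.

```python
import re
from typing import List


def motif_positions(rna: str, motif: str = "UGCAUG") -> List[int]:
    """Return 0-based start positions of all (possibly overlapping)
    occurrences of a motif in an RNA or DNA sequence."""
    rna = rna.upper().replace("T", "U")
    motif = motif.upper().replace("T", "U")
    # A zero-width lookahead lets overlapping matches be reported.
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", rna)]


# Toy sequence (hypothetical) containing two overlapping copies of the motif.
hits = motif_positions("AAUGCAUGCAUGCC")
```

Clusters of such hits along one transcript are the kind of feature that would suggest a protein-sequestering lncRNA, but a sequence match alone does not demonstrate binding, which is why the text pairs prediction with CLIP-seq or pull-down experiments.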

How do lncRNAs bind to their protein co-factors? There are a variety of known mechanisms for 
this. Xist contains at least two distinct domains. One is the RepC domain, which is bound by YY1 and 
hnRNP U for localization; the other is the RepA domain, which recruits PRC2 for gene 
expression regulation in cis [89,90]. Different from Xist, the PWS region sno-lncRNAs contain multiple 
consensus hexamer motifs for Fox family splicing regulators [91], which leads to the sequestration of 
Fox proteins and subsequently the alteration of patterns of Fox-regulated alternative splicing [35]. 
Interestingly, poorly conserved lncRNAs have been found associated with the same 
proteins. For example, human NEAT1 and mouse Men ε/β share low primary sequence similarity, but 
both are associated with DBHS proteins [21-24]. This suggests that RNA structural features may 
sometimes play important roles in the determination of their protein partners. Indeed, the recent 
application of genome-wide structural analysis that determines ncRNA secondary structure has begun 
to decipher the functional elements of the yeast transcriptome [92]. Similar studies in higher 
eukaryotes will help to reveal structural information and diverse biological insights of lncRNAs, 
possibly with their protein co-factors. 
Figure 1. Schematic diagram of long noncoding RNA discovery and function analysis 
using genome-wide methods. (a) Genomic locations for long noncoding RNA (lncRNA) 
transcription. Boxes represent annotated genes and exons. Arrows label the direction of 
transcription. (b) Methodology for lncRNA discovery and functional association with 
proteins. H3K4me3 signature defines transcription initiation. H3K36me3 signature defines 
transcription elongation. Signals of poly(A)+ RNA-seq indicate polyadenylated RNAs 
(including most annotated mRNAs and lncRNAs). Signals of poly(A)- RNA-seq indicate 
non-polyadenylated RNAs, including recently identified intronic transcripts. Signals of 
CLIP-seq/RIP-seq reveal the association of RNA transcripts with RNA binding proteins. 
4. Perspectives 

In the era of post-genomics, elucidating the full spectrum of RNA molecules expressed by a given cell is 
important for understanding gene expression and functional regulation. Largely from the previously 
imagined "dark matter" of the genome, a variety of lncRNAs have been systematically revealed from 
different tissues and species with clear characteristics distinguishing them from coding RNAs. The 
characteristics of lncRNAs are (1) low expression but with a pattern of tissue-specificity, (2) decreased 
conservation in primary sequence but with a likelihood of functional conservation, and (3) restrained 
coding capacity but with a probability of ancestral ORF relics. Transcriptome analyses by 
high-throughput technologies (including tiling arrays and RNA-seq) with high coverage, high sensitivity, 
and high efficiency represent a leap forward in our methodology for lncRNA characterization. 
Recent studies have inspired new insights into the study of lncRNAs, and in turn, these insights have 
prompted further application of novel methodologies for lncRNA study. 

Despite recent and rapid progress in our understanding of lncRNAs, a number of important features 
remain to be further addressed. For example, what are the landscapes of lncRNA expression in specific 
tissues/species and what are the connections of specific expression repertoires with specific 
tissue/species function? What are the distinct mechanisms for regulation of lncRNAs in specific 
tissues/species? What secondary structures are associated with lncRNA functions? Furthermore, 
existing computational algorithms are not sufficiently robust to deal with these sequence analyses. For 
example, they are less efficient for the accurate alignment of sequencing reads to lncRNAs in 
repetitive regions as well as for the precise transcript alignment of multiple lncRNA molecules from 
the same genomic segments. 

Clearly, the integration of not only new computational pipelines, but also further experimental 
approaches, will be required to improve our ability to discover new lncRNAs and to understand how 
they function in gene regulation. 